Depending on where you live, you may have encountered “air quality alerts,” notifications from the authorities that air pollution has reached a level considered unhealthy or even hazardous. In this activity, you will explore air quality data from Denver, Colorado, from late 2022 to mid-2024.
An air quality monitor is a device that uses technologies such as laser scattering to count small particles in the air passing through it. Some monitors are connected to the internet. The data we will use was collected by such a device. Each hour over the time interval covered, the device recorded counts of particles of various sizes: 0.3 \(\mu\)m, 2.5 \(\mu\)m, and 10 \(\mu\)m. It also recorded other meteorological data, such as air temperature and air pressure.
Here are a few rows of the data, covering just 48 hours.
You can see that the first record was made on October 17, 2022 at 19:00Z. (The Z refers to the time zone.) The second was made an hour later, and so on. There are 12,745 rows altogether.
Temperature, pressure, humidity
Let’s look at the most common measures of weather: temperature, humidity, and pressure. It’s hard to make sense of the recorded numbers by reading them directly; a graphical display is easier to interpret.
Outdoor temperature, humidity, and air pressure.
Play with the controls of the graph to zoom in and out and to see how to read the numerical values at any point in time.
The air pressure trace on the graph must be multiplied by ten to get the reading in millibars. Typically, air pressure at sea level is around 1010 millibars. Why is the Denver reading different?
To help you keep track of the days, a purple oscillating line with a period of 24 hours has been added to the graph.
Zoom in so that the graph shows 20 to 30 days. A “diurnal” variation is one that goes up and down on a daily basis. Looking at the traces, can you see diurnal variation in air temperature? Humidity? Pressure?
Zoom out so that the graph shows roughly half a year. What does the diurnal variation look like at this scale?
There is a gap in the data. How long does the gap last? What are the starting and ending dates of the gap?
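A quick way to locate a gap like this in code is to look at the time between consecutive records: in hourly data, any step longer than one hour marks missing rows. The sketch below uses base R and a short made-up vector of time stamps rather than the Denver data; with the real data you would use the `time_stamp` column instead, assuming it is stored as a `POSIXct` date-time.

```r
# Hypothetical hourly time stamps in UTC, with records missing
# between 03:00 and 07:00 (made up for illustration, not the Denver data).
time_stamp <- as.POSIXct(
  c("2023-05-01 00:00", "2023-05-01 01:00", "2023-05-01 02:00",
    "2023-05-01 03:00", "2023-05-01 07:00", "2023-05-01 08:00"),
  tz = "UTC"
)

# Hours elapsed between each record and the one before it.
step_hours <- as.numeric(diff(time_stamp), units = "hours")

# A step longer than 1 hour means some hourly records are missing.
gap_index <- which(step_hours > 1)

gaps <- data.frame(
  last_before   = time_stamp[gap_index],      # last record before the gap
  first_after   = time_stamp[gap_index + 1],  # first record after the gap
  hours_missing = step_hours[gap_index] - 1   # hourly records skipped
)
print(gaps)
```

Here the single gap runs from 03:00 to 07:00, so three hourly records (04:00, 05:00, 06:00) are missing.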
Particulate matter
- Times of very high pm2.5. Find an episode of high pm2.5 that lasts a day or more.
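One way to find an episode that lasts a day or more is run-length encoding: flag each hourly record as above or below a threshold, then measure the lengths of consecutive runs of flagged hours with base R's `rle()`. The sketch assumes gap-free hourly records; the readings and the threshold of 35 are hypothetical values chosen for illustration, not taken from the activity.

```r
# Hypothetical hourly PM2.5 readings (made up, not the Denver data):
# a 30-hour high episode and a shorter 6-hour one.
pm25 <- c(rep(10, 5), rep(40, 30), rep(12, 10), rep(50, 6))
threshold <- 35   # assumed cutoff for "high", chosen for illustration
min_hours <- 24   # an episode must last at least a day

high <- pm25 > threshold
runs <- rle(high)  # lengths and values of consecutive TRUE/FALSE runs

# Row positions where each run starts and ends.
run_end   <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

# Keep only runs of high readings that last long enough.
keep <- runs$values & runs$lengths >= min_hours
episodes <- data.frame(
  start = run_start[keep],
  end   = run_end[keep],
  hours = runs$lengths[keep]
)
print(episodes)
```

Because `start` and `end` are row positions, indexing the time stamps with them (for example `DenverAQI$time_stamp[episodes$start]`) would give the dates; runs that straddle the gap in the data should be checked by hand.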
Background on AQI
See this EPA document. My conclusions from a quick read:
- The AQI is calculated separately for each pollutant.
- The EPA gives a range of values for each category (e.g., unhealthy, hazardous).
- I think the index is just a linear spline of the concentration, that is, linear interpolation between the breakpoints of each category.
- Across all the pollutants covered, the overall AQI is the one with the maximum AQI (not an average, etc.).
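If the AQI is indeed a linear spline, R's `approx()` can compute it, since it interpolates linearly between breakpoints, and `pmax()` takes the maximum across pollutants row by row. The PM2.5 breakpoints below are an assumption: they follow the EPA's pre-2024 PM2.5 table, and the EPA revised those breakpoints in 2024, so check the current EPA document before relying on them. The PM10 index values are made up.

```r
# Assumed PM2.5 breakpoints (ug/m^3) and matching AQI values, following
# the EPA's pre-2024 table; verify against the current EPA document.
pm25_conc <- c(0.0, 12.0, 12.1, 35.4, 35.5, 55.4, 55.5, 150.4,
               150.5, 250.4, 250.5, 500.4)
pm25_aqi  <- c(0, 50, 51, 100, 101, 150, 151, 200,
               201, 300, 301, 500)

# Linear spline: straight-line interpolation between adjacent breakpoints.
# Concentrations outside the table return NA (approx's default rule = 1).
pm25_to_aqi <- function(conc) {
  approx(x = pm25_conc, y = pm25_aqi, xout = conc)$y
}

# Overall AQI: the maximum over pollutants, not an average.
# The pm10 index values here are hypothetical.
aqi_pm25 <- pm25_to_aqi(c(8, 24, 60))
aqi_pm10 <- c(20, 90, 40)
overall  <- pmax(aqi_pm25, aqi_pm10)
print(round(overall))
```

In the second row the PM10 index (90) exceeds the PM2.5 index (about 76), so PM10 sets the overall AQI for that hour.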
EXERCISES:
- Is there a significant correlation between temperature and pressure? Is it substantial?
- Find a confidence interval on the slope.
- What do you think R² will be?
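In R, `cor.test()` tests whether a correlation differs from zero, `lm()` fits the regression line, `confint()` gives a confidence interval on the slope, and `summary()` reports R². The sketch below uses simulated temperature and pressure values (hypothetical, not the Denver data) so that it runs on its own. It also illustrates a fact relevant to the last exercise: with a single explanatory variable, R² equals the square of the correlation coefficient.

```r
set.seed(2024)  # reproducible simulated data

# Hypothetical data: pressure weakly related to temperature, plus noise.
temperature <- runif(200, min = -10, max = 35)
pressure    <- 835 - 0.15 * temperature + rnorm(200, sd = 4)

# Test of H0: the population correlation is zero.
ct <- cor.test(temperature, pressure)
print(ct$estimate)  # sample correlation r
print(ct$p.value)   # significance says nothing about size

# Regression of pressure on temperature.
fit <- lm(pressure ~ temperature)
print(confint(fit, "temperature", level = 0.95))  # 95% CI on the slope

# For simple regression, R^2 equals the squared correlation.
r2 <- summary(fit)$r.squared
print(c(R2 = r2, r_squared = unname(ct$estimate)^2))
```

A correlation can be statistically significant yet small; the size of r (or R²) answers the "substantial" question, while the p-value answers the "significant" one.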
These labels come from the US EPA.
There’s a time stamp. Do you think it is in the Denver time zone? How would you tell?
The daily low temperature consistently falls in hour 12 of the time stamp. Ordinarily, one expects the low in the very early morning, say 6 am, so the time stamp seems to be about six hours ahead of the actual time in Denver. Look up Zulu time.
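R can do the time-zone conversion directly, assuming the time stamps are stored in UTC (the "Zulu" time the Z denotes). The sketch uses the first time stamp mentioned above plus a hypothetical winter one, and it relies on the system's time-zone database containing `America/Denver`. It also shows that the offset is not constant: Denver is six hours behind UTC during daylight saving time and seven hours behind in winter.

```r
# Time stamps recorded in UTC ("Zulu" time); the second one is hypothetical.
utc <- as.POSIXct(c("2022-10-17 19:00", "2024-01-15 19:00"), tz = "UTC")

# The same instants shown as Denver local clock time.
local <- format(utc, "%Y-%m-%d %H:%M %Z", tz = "America/Denver")
print(local)

# Local hour of day, e.g. for grouping records by Denver time of day.
local_hour <- as.integer(format(utc, "%H", tz = "America/Denver"))
print(local_hour)
```

The October record converts to 13:00 (daylight time) and the January one to 12:00 (standard time).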
Simple statistics: min, max, median, mean, variance
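Each of these statistics is a single base R function. The sketch applies them to a short made-up vector standing in for one column of the data; with the real data, add `na.rm = TRUE` to each call if the column contains missing values.

```r
# A small made-up vector standing in for one column of the data.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

stats <- c(
  min      = min(x),
  max      = max(x),
  median   = median(x),
  mean     = mean(x),
  variance = var(x)   # sample variance: divides by n - 1
)
print(stats)
```

Note that `var()` divides by n - 1 rather than n, so it gives 32/7 here rather than 4.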
Q: Are visual_range and scattering_coefficient telling more or less the same story?
Q: Which of the counts is most strongly related to visual_range: 0.3, 2.5, 5.0, 10.5?
Rank statistics: Assign to each row its quantile in the AQI index.
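One way to assign each row its quantile is to divide its rank by the number of rows, which gives the fraction of rows at or below that value. Using `ties.method = "max"` makes this agree with the empirical distribution function from `ecdf()`. The AQI values below are made up for illustration.

```r
# Hypothetical AQI values for six rows (made up for illustration).
aqi <- c(42, 87, 15, 87, 150, 60)

# Fraction of rows with an AQI at or below each row's value.
# ties.method = "max" gives tied rows the same (upper) quantile.
aqi_quantile <- rank(aqi, ties.method = "max") / length(aqi)
print(aqi_quantile)

# ecdf() returns a function; evaluating it at the data gives the same result.
print(ecdf(aqi)(aqi))
```

The two rows with AQI 87 share the quantile 5/6, and the largest value gets quantile 1.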
Correlation: Which other variables correlate most strongly with X0.3_um_count?
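A compact way to answer this is to compute the correlation of every other numeric column with the target column and sort by absolute value. The data frame below is simulated: its column names mimic the activity's, but the values and the relationships between them are hypothetical.

```r
set.seed(1)
n <- 100
# Hypothetical stand-in for the Denver data (simulated values).
d <- data.frame(X0.3_um_count = rlnorm(n, meanlog = 8, sdlog = 0.5))
d$pm2.5       <- d$X0.3_um_count / 300 + rnorm(n, sd = 2)
d$humidity    <- runif(n, 10, 90)
d$temperature <- rnorm(n, 12, 8)

# Correlation of every other column with the target column,
# skipping rows where either value is missing.
target <- "X0.3_um_count"
others <- setdiff(names(d), target)
r <- sapply(others, function(v) cor(d[[v]], d[[target]], use = "complete.obs"))

# Sort by strength (absolute value), strongest first.
print(r[order(abs(r), decreasing = TRUE)])
```

Setting `target <- "visual_range"` and restricting `others` to the count columns would address the earlier question about which count tracks visual range most closely.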
Pattern finding: Pick two of the variables that you think would be telling and plot one versus the other, using the rank AQI for color. Can you see any patterns?
Explore the time series in an interactive graph. Do high particle counts occur at a particular time of year? Find a date range where the count is above 11,000. Does the count at these times depend on the time of day?
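To check whether the count depends on the time of day, one can average it within each hour of the day using `tapply()`. The sketch simulates ten days of hourly counts with a built-in daily cycle (hypothetical data, not the Denver record). Its hours are in UTC; for local Denver hours, pass `tz = "America/Denver"` to `format()` instead.

```r
set.seed(7)
# Ten days of hypothetical hourly time stamps (UTC) and counts with a daily cycle.
time_stamp <- seq(as.POSIXct("2024-01-01 00:00", tz = "UTC"),
                  by = "hour", length.out = 24 * 10)
hour_utc <- as.integer(format(time_stamp, "%H", tz = "UTC"))
count <- 9000 + 2500 * cos(2 * pi * (hour_utc - 14) / 24) +
  rnorm(length(time_stamp), sd = 300)

# Mean count for each hour of the day (0 to 23).
by_hour <- tapply(count, hour_utc, mean)
print(round(by_hour))

# Hour with the highest average count.
peak_hour <- as.integer(names(which.max(by_hour)))
print(peak_hour)

# Share of records above 11,000.
print(mean(count > 11000))
```

A clear rise and fall across the 24 hourly means, rather than a flat profile, would suggest that high counts depend on the time of day.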
Then we will move to point plots that compare two or more variables.
```r
library(dplyr)
library(dygraphs)

# Particulate matter shown as ranks; tod is replaced by a
# 24-hour sine wave that marks the days.
DenverAQI |>
  select(time_stamp, pm2.5, pm10, tod) |>
  mutate(pm2.5 = rank(pm2.5), pm10 = rank(pm10),
         tod = 10 + 10 * sin(2*pi*((tod-6) %% 24)/24)) |>
  dygraph() |>
  dyRangeSelector(dateWindow = c("2024-01-01", "2024-04-30")) |>
  dyHighlight(highlightCircleSize = 5,
              highlightSeriesBackgroundAlpha = 0.2,
              hideOnMouseOut = FALSE)
```

```r
# Weather variables; pressure is divided by 10 so it shares the axis.
DenverAQI |>
  select(time_stamp, temperature, humidity, pressure, tod) |>
  mutate(
    pressure = pressure/10,
    tod = 0 + 10 * sin(2*pi*((tod-6) %% 24)/24)) |>
  dygraph() |>
  dyRangeSelector(dateWindow = c("2024-01-01", "2024-04-30")) |>
  dyHighlight(
    highlightCircleSize = 5,
    highlightSeriesBackgroundAlpha = 0.2,
    hideOnMouseOut = FALSE)
```

A high scattering_coefficient is associated with poor air quality. In other words, haziness is a good indicator of poor air quality.